This letter reports two moment extensions of the entropy of a distribution. By understanding the
traditional entropy as the average of the original distribution up to a random variable
transformation, the traditional moment equations become immediately applicable to entropy. We also
suggest an alternative family of entropy moments. The discriminative potential of such entropy
moment extensions is illustrated with respect to different types of distributions with otherwise
indistinguishable traditional entropies.
'Looking into each globe, you see a blue city, the model
of a different Fedora.' (I. Calvino, Invisible Cities)

Given a statistical distribution p(x), where x is the respective random variable, an important
problem is to synthesize its most important features into as few measurements i = 1, 2, ..., N
as possible. Although a distribution incorporates all information about the respective random
variable x, it typically involves a large number of values: continuous distributions have
infinitely many values, while discrete distributions typically involve a large number of bins.
However, summarizing a distribution in terms of a few respective functionals is not
straightforward and ultimately depends on specific goals. For instance, one may be interested in
intervals of regularity along the distribution, or in the overall dispersion. Generally, it is
useful to remove the redundancy from the distribution, retaining only the most informative
variations and singularities. Traditional functionals of distributions include the respective
moments, given as



    M(p(x), k) = \int_I x^k \, p(x) \, dx    (1)



where I is the domain of p(x), i.e. its sampling space. Observe that these moments carry the
dimension of the original random variable x (the k-th moment has the dimension of x^k). The first
moment corresponds to the average and the second moment is related to the variance of the random
variable x. It is known from statistical theory (e.g. p], Q) that the infinite set of all moments
can, under certain conditions (the so-called moment problem), provide a complete mapping of the
original distribution, in the sense that the latter can be recovered from the former. Generally,
increasing information about the features of the original distribution can be obtained by
considering a larger number of moments. Another important functional of a statistical
distribution is its respective entropy (e.g. [!, 0, [Bj] ), which is defined as



    e(p(x)) = -\int_I p(x) \log(p(x)) \, dx    (2)

This measurement becomes zero for distributions concentrated on a single value of x, being
maximized for uniform distributions, i.e. identical values of p(x) along I. The entropy
measurement exhibits several particularly relevant properties, including its intrinsic
relationship with statistical physics (e.g. p), entropy maximization (e.g. [5|,lZ|), information
theory and channel capacity (e.g. [5]). The entropy is also invariant to transformations of the
values of x, i.e. the entropy of the distribution of x is identical to that of the distribution
of the new random variable y = f(x), where f is any one-to-one function. Yet, the entropy is
typically considered as an isolated measurement.

In this article we suggest a family of entropy-based measurements which provide enhanced
information about the original distribution. First, we show that the entropy can be understood as
a special case of the first moment, where the values of the random variable x are substituted by
the dimensionless quantity log(p(x)), i.e. the weights in the average definition are exchanged by
the logarithm of the distribution values. By doing so, it becomes possible to calculate all
respective moments and central moments, which are henceforth called the entropy moments and
entropy central moments. We illustrate the power of such additional statistical measurements with
respect to the discrimination between important types of statistical distributions.

We henceforth focus our attention on discrete distributions represented in the continuous space
of the variable x, i.e.

    p(x) = p(x_i) = \sum_{i=1}^{N} p(i) \, \delta(x_i)    (3)



where x is a continuous variable in I = [a, b] and \delta(x_i) is the Dirac delta function placed
at x_i, i.e. \delta(x_i) = \delta(x - x_i), with i = 1, 2, ..., N. Therefore, p(x_i) can be used
to represent any relative frequency histogram. The moments of this distribution are immediately
given by Equation 1.
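For a relative frequency histogram, the integrals in Equations 1 and 2 reduce to sums over the bins. The following is a minimal sketch of that discrete form, assuming natural logarithms and the convention that empty bins contribute nothing to the entropy; the function names and the example histogram are illustrative and do not come from the original work.

```python
import math

def raw_moment(xs, ps, k):
    """k-th moment of a discrete distribution: sum_i x_i**k * p_i (Eq. 1 as a sum)."""
    return sum((x ** k) * p for x, p in zip(xs, ps))

def entropy(ps):
    """Traditional entropy -sum_i p_i * log(p_i) (Eq. 2 as a sum).

    Bins with p_i = 0 are skipped (assumed convention 0 * log 0 = 0).
    """
    return -sum(p * math.log(p) for p in ps if p > 0)

# Hypothetical three-bin histogram on I = [0, 1].
xs = [0.0, 0.5, 1.0]
ps = [0.25, 0.5, 0.25]

mean = raw_moment(xs, ps, 1)
variance = raw_moment(xs, ps, 2) - mean ** 2
print(mean, variance, entropy(ps))
```

The variance is obtained from the first two moments, which is the sense in which the second moment is "related to" it.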

Now, by introducing the new random variable y_i = log(p(x_i)), we can rewrite the entropy as

    e(p(x_i)) = -\int_I p(x_i) \log(p(x_i)) \, dx = -\int_J y_i \, p(y_i) \, dy    (4)

where J is the mapped version of the interval I, i.e. J = [min(log(p(x_i))), max(log(p(x_i)))].
Observe that p(x_i) = p(y_i) for any i = 1, 2, ..., N. We have from Equation 4 that the
traditional entropy can be understood as the negative of the first moment (i.e. average) of the
distribution of the transformed random variable y_i = log(p(x_i)). The extension to higher order
moments is straightforward and yields the entropy moments ME given by Equation 5; the alternative
entropy moments FE are defined by Equation 6.

    ME(p(x_i), k) = -\int_J (y_i)^k \, p(y_i) \, dy    (5)

    FE(p(x_i), k) = \log\left( -\int_J (p(y_i))^k \, y_i \, dy \right)    (6)

We necessarily have that ME(p(x_i), k) = ME(p(y_i), k) and FE(p(x_i), k) = FE(p(y_i), k). Observe
that the non-dimensionality of y_i is immediately extended to the entropy moments. In addition,
the consideration of y_i as the weights for the moment calculation implies the respectively
induced distribution p(y_i) to be sorted into ascending order. It should also be observed that
successive entropy moments ME tend to alternate in sign, because y_i = log(p(x_i)) is never
positive. Because of the moment mapping theorem, we have that all the information in the original
distribution p(x_i) is captured by the infinite set of respective moments. Therefore, these
additional entropy-based measurements provide an interesting complement to the traditional
entropy, allowing a more comprehensive characterization of the original distribution in terms of
a set of respective functionals, in direct analogy with the role of the traditional moments. The
alternative entropy moments defined by Equation 6 have been found to allow particularly
discriminating measurements. In this definition, the outermost logarithm is used in order to
obtain more manageable values.
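The two families of entropy moments can be sketched for a histogram as follows. This is only an illustration under stated assumptions: the integrals over J are replaced by sums over the bins, natural logarithms are used, empty bins are skipped, and the function names are hypothetical.

```python
import math

def entropy_moment(ps, k):
    """ME(p, k) = -sum_i (log p_i)**k * p_i (discrete form of Eq. 5)."""
    return -sum((math.log(p) ** k) * p for p in ps if p > 0)

def alt_entropy_moment(ps, k):
    """FE(p, k) = log(-sum_i p_i**k * log p_i) (discrete form of Eq. 6).

    math.log raises ValueError when the inner sum is zero, which happens
    for a distribution concentrated on a single bin.
    """
    inner = -sum((p ** k) * math.log(p) for p in ps if p > 0)
    return math.log(inner)

# Hypothetical histogram used only for illustration.
ps = [0.5, 0.25, 0.25]
for k in (1, 2, 3):
    print(k, entropy_moment(ps, k), alt_entropy_moment(ps, k))
```

With k = 1 the first function returns the traditional entropy, and for this histogram the sign of ME flips between consecutive orders.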

In the remainder of this article, we provide a series of examples of the potential of the entropy
moments and alternative entropy moments. First, we consider distributions of the type
p(x_i) = w exp(c i), with x_i = (i - 1)Δ and Δ = (b - a)/(N - 1), where c is a real value such
that 0 < c and w is a normalizing constant ensuring \int_I p(x_i) dx = 1. Observe that this
distribution becomes the uniform distribution when c = 0 and the constant distribution when
c → ∞. Figure 1 illustrates the distribution p(x_i) = w exp(c i) (a-d) and the normal
distribution q(x_i) = s exp(-0.5((c i - μ)/σ)^2) (i-l), where s is a normalizing constant, as
well as the respective transformed distributions p(y_i) (e-h) and q(y_i) (m-p) for several values
of c, assuming the values of x_i to be distributed at equal spaces along I = [0, 1]. Observe that
the distribution p(x_i) tends to become less uniform for larger values of c (moving from Fig. 1a
to d), while the opposite is verified for q(x_i) (moving from Fig. 1i to l). Such trends are
clearly reflected in the respective entropy values (i.e. e(p(x_i)) = ME(p(x_i), 1)) shown above
the respective transformed distributions in Figure 1(e-h) and (m-p), respectively. It is also
clear from Figure 1, particularly for the distribution q(x_i), that the transformed distribution
is sorted in increasing order as a consequence of the random variable transformation
y_i = log(p(x_i)). Observe also the increased density of Dirac deltas at the right-hand side of
the distributions in Figure 1(m-p), which is a consequence of the similar values of the normal
distribution q(x_i) near its peak.
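The two test distributions and their transformed versions can be generated along the following lines. This is a sketch, not the original setup: the number of bins and the values of μ and σ are not given in the text, so the ones used here are assumptions, as are the function names.

```python
import math

def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def exponential_family(c, n):
    """p(x_i) = w * exp(c * i) for i = 1..n, with w fixed by normalization."""
    return normalize([math.exp(c * i) for i in range(1, n + 1)])

def gaussian_family(c, n, mu, sigma):
    """q(x_i) = s * exp(-0.5 * ((c * i - mu) / sigma) ** 2) for i = 1..n."""
    return normalize([math.exp(-0.5 * ((c * i - mu) / sigma) ** 2)
                      for i in range(1, n + 1)])

def transformed(ps):
    """Pairs (y_i, p_i) with y_i = log p_i, sorted by increasing y_i."""
    return sorted((math.log(p), p) for p in ps if p > 0)

# Assumed parameters: n = 10 bins, mu = 5, sigma = 2.
p = exponential_family(1.0, 10)
q = gaussian_family(1.0, 10, 5.0, 2.0)
for y, prob in transformed(q):
    print(round(y, 3), round(prob, 4))
```

Because the logarithm is monotonic, sorting by y_i also sorts the probabilities, and the near-equal values around the peak of q cluster at the largest y_i.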

Figure 2 depicts the set of alternative entropy moments FE(p(x_i), k) of the distributions p(x_i)
(a) and q(x_i) (b) as above, in terms of c for several values of k, i.e. the order of the
alternative entropy moments. The points where the alternative entropy moments of p(x_i) and
q(x_i) equal one another have been marked by the 'vertical' trajectory. It is clear from these
results that, though the distributions p and q have identical traditional entropy for c ≈ 1.06,
substantial differences are observed between the higher order alternative entropy moments.
Interestingly, though the first alternative entropy moment (identical to the traditional entropy)
increases with c as expected, the higher order moments tend to decrease with c.
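The same effect can be reproduced with a toy pair of distributions that are not the p and q of the figures: a two-bin uniform histogram and a three-bin histogram whose free parameter is tuned by bisection until both have the same traditional entropy. The sketch below is an illustration under these assumptions only; the distributions and names are hypothetical.

```python
import math

def entropy_moment(ps, k):
    """ME(p, k) = -sum_i (log p_i)**k * p_i; k = 1 gives the traditional entropy."""
    return -sum((math.log(p) ** k) * p for p in ps if p > 0)

def three_point(b):
    """Toy histogram with one bin of mass b and two bins sharing the rest."""
    return [b, (1.0 - b) / 2.0, (1.0 - b) / 2.0]

two_point = [0.5, 0.5]
target = entropy_moment(two_point, 1)  # log 2

# The entropy of three_point(b) decreases from log 3 to 0 as b goes from 1/3 to 1,
# so a plain bisection finds the b whose entropy matches the target.
lo, hi = 1.0 / 3.0, 1.0
for _ in range(100):
    mid = (lo + hi) / 2.0
    if entropy_moment(three_point(mid), 1) > target:
        lo = mid
    else:
        hi = mid
matched = three_point((lo + hi) / 2.0)

print(entropy_moment(two_point, 1), entropy_moment(matched, 1))
print(entropy_moment(two_point, 2), entropy_moment(matched, 2))
```

The first printed pair coincides while the second does not, so the second entropy moment separates two histograms that the traditional entropy cannot.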

In order to better illustrate the potential of the entropy moments for providing additional
information about the original distribution, we now focus our attention on the two above
distributions p(x_i) and q(x_i) at a value of the parameter c at which they cannot be
discriminated by considering the respective traditional entropies. In order to simulate the
sampling noise and artifacts typically implied while measuring the random variable x, we add a
uniformly distributed perturbation to each of the two distributions. Figure 3 shows the
histograms of the traditional entropies calculated for the two perturbed distributions. Because
of the complete superposition between the respective histograms, it is virtually impossible to
discriminate between the two cases while taking into account their respective traditional
entropies.
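A perturbation experiment of this kind might be sketched as below. The noise amplitude, the clipping of negative values and the renormalization step are not specified in the text and are assumptions of this sketch, as are the function names.

```python
import math
import random

def entropy(ps):
    """Traditional entropy -sum_i p_i * log(p_i), skipping empty bins."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def perturb(ps, amplitude, rng):
    """Add uniform noise in [-amplitude, amplitude] to each bin.

    Negative values are clipped to zero and the result is renormalized
    (both steps are assumptions; the amplitude should be small enough
    that at least one bin stays positive).
    """
    noisy = [max(p + rng.uniform(-amplitude, amplitude), 0.0) for p in ps]
    total = sum(noisy)
    return [v / total for v in noisy]

def entropy_samples(ps, amplitude, n_trials, seed=0):
    """Entropies of n_trials independently perturbed copies of ps."""
    rng = random.Random(seed)
    return [entropy(perturb(ps, amplitude, rng)) for _ in range(n_trials)]

samples = entropy_samples([0.25, 0.25, 0.25, 0.25], 0.01, 200)
print(min(samples), max(samples))
```

Binning the returned values for each of the two distributions gives histograms of the kind compared in Figure 3.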

We now consider the effect of additional entropy moments on the discriminability between the
measurements. Figure 4 illustrates the scattering of the entropy moments obtained for the
perturbed realizations of the two types of distributions (i.e. p(x_i) and q(x_i)) considering
3 (a), 6 (b), 9 (c) and 12 (d) entropy moments. The two-dimensional projections shown in
Figure 4 were obtained by using the principal component analysis (PCA) methodology
(e.g. [1, 0, [Tq[), which ensures maximum dispersion along the first axes of the projections,
which are defined by the transformed variables pca_i, i = 1, 2, .... More specifically, the PCA
involves the calculation of the covariance matrix of the considered measurements and the
estimation of the respective eigenvalues and eigenvectors. The linear transformation used to
project the higher dimensional space is defined by the eigenvectors of the covariance matrix
taken in decreasing order of the respective eigenvalues. In order to compensate for the largely
different
FIG. 1: The distributions p(x_i) = w exp(c i) (a-d) and q(x_i) = s exp(-0.5((c i - μ)/σ)^2) (i-l) as well as their respective
transformations p(y_i) and q(y_i) for several values of c.




FIG. 2: The entropy moments of p(x_i) (red) and q(x_i) (blue)
in terms of c for several values of k (shown from top to bottom).
The uppermost red and blue curves (above 0)
correspond to the traditional entropies of p(x_i) and q(x_i).



FIG. 3: The histograms of the traditional entropy obtained
for the perturbed versions of the two distributions p(x_i) and
q(x_i). Because of the complete overlap between these two
histograms, it is impossible to discriminate between
the original distributions while considering their respective
traditional entropies.



values of the entropy moments, their values were standardized [11] prior to the PCA. It is clear
from the results shown in Figure 4(a-d) that the incorporation of additional entropy moments
contributed substantially to the separation between the perturbed cases. However, the
consideration of further entropy moments beyond the first few tended not to enhance such a
separation. For instance, the separation between the two perturbed distributions considering
3 entropy moments (Fig. 4a) is similar to that obtained for 12 entropy moments (Fig. 4d). In
addition, the contribution of the higher order entropy moments had almost no effect in increasing
the separation between the two categories of observations while considering the third principal
component axis, i.e. pca3 (see Figs. 4e-h).
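The standardization and PCA steps described above can be sketched with the standard library alone. The eigenvectors are estimated here by power iteration with deflation, which is a choice made for this illustration and not necessarily the procedure used to produce the figures; the function names and the small data set are likewise hypothetical.

```python
import math
import random

def standardize(rows):
    """Column-wise z-scores: zero mean and unit sample standard deviation.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def leading_eigenpair(matrix, iterations=1000, seed=0):
    """Largest eigenvalue and its eigenvector for a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix))
    return value, vec

def pca_project(rows, n_components=2):
    """Standardize rows, then project onto the leading eigenvectors of the covariance."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    components = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append(vec)
        # Deflation: remove the component just found so the next one can emerge.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(d)] for a in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in z]

# Hypothetical feature vectors, one row per observation.
rows = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.3], [3.0, 5.9, 0.9], [4.0, 8.2, 0.1], [5.0, 9.8, 0.7]]
for point in pca_project(rows, 2):
    print(point)
```

In the setting of the text, each row would hold the first 3, 6, 9 or 12 entropy moments of one perturbed realization, and the two printed coordinates would correspond to pca1 and pca2.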

Figure 5 shows the PCA results considering alternative entropy moments, instead of the entropy
moments as above. The incorporation of additional alternative entropy moments progressively
increases the discrimination between the two sets of observations regarding all of the first
three PCA variables (i.e. pca1, pca2 and pca3).

All in all, we have reported two families of entropy moments, obtained by interpreting the
traditional entropy as the average of a transformed version of the original distribution. Such
additional measurements have been shown to contribute substantially to the characterization of
the original distributions, as clearly illustrated for a case involving two distributions with
indistinguishable traditional entropies. Because of the key role played by entropy in so many
areas, the concepts and results described in this work have several immediate implications.
Among the several possibilities for future developments, we have the investigation of entropy
central moments, including the development of a PCA methodology based on the respectively implied
entropy covariance matrix. It would also be interesting to investigate the type of distribution
features which lead to extreme values of each of the entropy moments.
FIG. 4: The scattering of the two categories of perturbed distributions as revealed by two-dimensional projection (through
PCA) of respective characterizations incorporating 3 (a), 6 (b), 9 (c) and 12 (d) entropy moments.

FIG. 5: The scattering of the two categories of perturbed distributions as revealed by two-dimensional projection (through
PCA) of respective characterizations incorporating 3 (a), 6 (b), 9 (c) and 12 (d) alternative entropy moments.



